-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[MachineLICM] Fine tune getRegPressureSetLimit #156173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There are two API of getRegPressureSetLimit() in backend. One is provided by TargetRegisterInfo which return the RegPressureSetLimit that is determined by specific target without considering the reserved registers. The other is provided by RegisterClassInfo which is based on TargetRegisterInfo::getRegPressureSetLimit and is adjusted dynamically for reserved registers. Most backend pass (e.g., scheduler) use TargetRegisterInfo::getRegPressureSetLimit. However MachineLICM still use TargetRegisterInfo::getRegPressureSetLimit which is not accurate. This patch replaces the API TargetRegisterInfo::getRegPressureSetLimit with TargetRegisterInfo::getRegPressureSetLimit in MachineLICM pass.
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-globalisel Author: Luo, Yuanke (LuoYuanke) ChangesThere are two API of getRegPressureSetLimit() in backend. One is provided by Patch is 6.91 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/156173.diff 63 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp
index 286fbfd373b59..f1c14ff0045a3 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -396,13 +396,15 @@ bool MachineLICMImpl::run(MachineFunction &MF) {
LLVM_DEBUG(dbgs() << MF.getName() << " ********\n");
if (PreRegAlloc) {
+ RegisterClassInfo RegClassInfo;
+ RegClassInfo.runOnMachineFunction(MF);
// Estimate register pressure during pre-regalloc pass.
unsigned NumRPS = TRI->getNumRegPressureSets();
RegPressure.resize(NumRPS);
llvm::fill(RegPressure, 0);
RegLimit.resize(NumRPS);
for (unsigned i = 0, e = NumRPS; i != e; ++i)
- RegLimit[i] = TRI->getRegPressureSetLimit(MF, i);
+ RegLimit[i] = RegClassInfo.getRegPressureSetLimit(i);
}
if (HoistConstLoads)
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
index 666523c88860c..39c5b4d5a4741 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
@@ -330,13 +330,13 @@ define float @global_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(pt
; GFX942-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX942-NEXT: global_load_dword v3, v[0:1], off
; GFX942-NEXT: s_mov_b64 s[0:1], 0
-; GFX942-NEXT: v_max_f32_e32 v2, v2, v2
; GFX942-NEXT: .LBB4_1: ; %atomicrmw.start
; GFX942-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX942-NEXT: s_waitcnt vmcnt(0)
; GFX942-NEXT: v_mov_b32_e32 v5, v3
-; GFX942-NEXT: v_max_f32_e32 v3, v5, v5
-; GFX942-NEXT: v_max_f32_e32 v4, v3, v2
+; GFX942-NEXT: v_max_f32_e32 v3, v2, v2
+; GFX942-NEXT: v_max_f32_e32 v4, v5, v5
+; GFX942-NEXT: v_max_f32_e32 v4, v4, v3
; GFX942-NEXT: buffer_wbl2 sc1
; GFX942-NEXT: global_atomic_cmpswap v3, v[0:1], v[4:5], off sc0
; GFX942-NEXT: s_waitcnt vmcnt(0)
@@ -375,13 +375,13 @@ define float @global_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(pt
; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX90A-NEXT: global_load_dword v3, v[0:1], off
; GFX90A-NEXT: s_mov_b64 s[4:5], 0
-; GFX90A-NEXT: v_max_f32_e32 v2, v2, v2
; GFX90A-NEXT: .LBB4_1: ; %atomicrmw.start
; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: v_mov_b32_e32 v5, v3
-; GFX90A-NEXT: v_max_f32_e32 v3, v5, v5
-; GFX90A-NEXT: v_max_f32_e32 v4, v3, v2
+; GFX90A-NEXT: v_max_f32_e32 v3, v2, v2
+; GFX90A-NEXT: v_max_f32_e32 v4, v5, v5
+; GFX90A-NEXT: v_max_f32_e32 v4, v4, v3
; GFX90A-NEXT: global_atomic_cmpswap v3, v[0:1], v[4:5], off glc
; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_wbinvl1
@@ -399,13 +399,13 @@ define float @global_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(pt
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX908-NEXT: global_load_dword v3, v[0:1], off
; GFX908-NEXT: s_mov_b64 s[4:5], 0
-; GFX908-NEXT: v_max_f32_e32 v2, v2, v2
; GFX908-NEXT: .LBB4_1: ; %atomicrmw.start
; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: v_mov_b32_e32 v4, v3
-; GFX908-NEXT: v_max_f32_e32 v3, v4, v4
-; GFX908-NEXT: v_max_f32_e32 v3, v3, v2
+; GFX908-NEXT: v_max_f32_e32 v3, v2, v2
+; GFX908-NEXT: v_max_f32_e32 v5, v4, v4
+; GFX908-NEXT: v_max_f32_e32 v3, v5, v3
; GFX908-NEXT: global_atomic_cmpswap v3, v[0:1], v[3:4], off glc
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: buffer_wbinvl1
@@ -475,21 +475,21 @@ define void @global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory(p
; GFX942-LABEL: global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory:
; GFX942: ; %bb.0:
; GFX942-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX942-NEXT: global_load_dword v3, v[0:1], off
+; GFX942-NEXT: global_load_dword v5, v[0:1], off
; GFX942-NEXT: s_mov_b64 s[0:1], 0
-; GFX942-NEXT: v_max_f32_e32 v4, v2, v2
; GFX942-NEXT: .LBB5_1: ; %atomicrmw.start
; GFX942-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX942-NEXT: s_waitcnt vmcnt(0)
-; GFX942-NEXT: v_max_f32_e32 v2, v3, v3
-; GFX942-NEXT: v_max_f32_e32 v2, v2, v4
+; GFX942-NEXT: v_max_f32_e32 v3, v5, v5
+; GFX942-NEXT: v_max_f32_e32 v4, v2, v2
+; GFX942-NEXT: v_max_f32_e32 v4, v3, v4
; GFX942-NEXT: buffer_wbl2 sc1
-; GFX942-NEXT: global_atomic_cmpswap v2, v[0:1], v[2:3], off sc0
+; GFX942-NEXT: global_atomic_cmpswap v3, v[0:1], v[4:5], off sc0
; GFX942-NEXT: s_waitcnt vmcnt(0)
; GFX942-NEXT: buffer_inv sc1
-; GFX942-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3
+; GFX942-NEXT: v_cmp_eq_u32_e32 vcc, v3, v5
; GFX942-NEXT: s_or_b64 s[0:1], vcc, s[0:1]
-; GFX942-NEXT: v_mov_b32_e32 v3, v2
+; GFX942-NEXT: v_mov_b32_e32 v5, v3
; GFX942-NEXT: s_andn2_b64 exec, exec, s[0:1]
; GFX942-NEXT: s_cbranch_execnz .LBB5_1
; GFX942-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -519,20 +519,20 @@ define void @global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory(p
; GFX90A-LABEL: global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory:
; GFX90A: ; %bb.0:
; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX90A-NEXT: global_load_dword v3, v[0:1], off
+; GFX90A-NEXT: global_load_dword v5, v[0:1], off
; GFX90A-NEXT: s_mov_b64 s[4:5], 0
-; GFX90A-NEXT: v_max_f32_e32 v4, v2, v2
; GFX90A-NEXT: .LBB5_1: ; %atomicrmw.start
; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX90A-NEXT: s_waitcnt vmcnt(0)
-; GFX90A-NEXT: v_max_f32_e32 v2, v3, v3
-; GFX90A-NEXT: v_max_f32_e32 v2, v2, v4
-; GFX90A-NEXT: global_atomic_cmpswap v2, v[0:1], v[2:3], off glc
+; GFX90A-NEXT: v_max_f32_e32 v3, v5, v5
+; GFX90A-NEXT: v_max_f32_e32 v4, v2, v2
+; GFX90A-NEXT: v_max_f32_e32 v4, v3, v4
+; GFX90A-NEXT: global_atomic_cmpswap v3, v[0:1], v[4:5], off glc
; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_wbinvl1
-; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3
+; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v3, v5
; GFX90A-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
-; GFX90A-NEXT: v_mov_b32_e32 v3, v2
+; GFX90A-NEXT: v_mov_b32_e32 v5, v3
; GFX90A-NEXT: s_andn2_b64 exec, exec, s[4:5]
; GFX90A-NEXT: s_cbranch_execnz .LBB5_1
; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -542,20 +542,20 @@ define void @global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory(p
; GFX908-LABEL: global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory:
; GFX908: ; %bb.0:
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX908-NEXT: global_load_dword v3, v[0:1], off
+; GFX908-NEXT: global_load_dword v4, v[0:1], off
; GFX908-NEXT: s_mov_b64 s[4:5], 0
-; GFX908-NEXT: v_max_f32_e32 v4, v2, v2
; GFX908-NEXT: .LBB5_1: ; %atomicrmw.start
; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX908-NEXT: s_waitcnt vmcnt(0)
-; GFX908-NEXT: v_max_f32_e32 v2, v3, v3
-; GFX908-NEXT: v_max_f32_e32 v2, v2, v4
-; GFX908-NEXT: global_atomic_cmpswap v2, v[0:1], v[2:3], off glc
+; GFX908-NEXT: v_max_f32_e32 v3, v4, v4
+; GFX908-NEXT: v_max_f32_e32 v5, v2, v2
+; GFX908-NEXT: v_max_f32_e32 v3, v3, v5
+; GFX908-NEXT: global_atomic_cmpswap v3, v[0:1], v[3:4], off glc
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: buffer_wbinvl1
-; GFX908-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3
+; GFX908-NEXT: v_cmp_eq_u32_e32 vcc, v3, v4
; GFX908-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
-; GFX908-NEXT: v_mov_b32_e32 v3, v2
+; GFX908-NEXT: v_mov_b32_e32 v4, v3
; GFX908-NEXT: s_andn2_b64 exec, exec, s[4:5]
; GFX908-NEXT: s_cbranch_execnz .LBB5_1
; GFX908-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -608,15 +608,15 @@ define double @global_agent_atomic_fmax_ret_f64__amdgpu_no_fine_grained_memory(p
; GFX12-NEXT: s_wait_bvhcnt 0x0
; GFX12-NEXT: s_wait_kmcnt 0x0
; GFX12-NEXT: global_load_b64 v[4:5], v[0:1], off
-; GFX12-NEXT: v_max_num_f64_e32 v[2:3], v[2:3], v[2:3]
; GFX12-NEXT: s_mov_b32 s0, 0
; GFX12-NEXT: .LBB6_1: ; %atomicrmw.start
; GFX12-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX12-NEXT: s_wait_loadcnt 0x0
; GFX12-NEXT: v_dual_mov_b32 v7, v5 :: v_dual_mov_b32 v6, v4
-; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX12-NEXT: v_max_num_f64_e32 v[4:5], v[6:7], v[6:7]
-; GFX12-NEXT: v_max_num_f64_e32 v[4:5], v[4:5], v[2:3]
+; GFX12-NEXT: v_max_num_f64_e32 v[4:5], v[2:3], v[2:3]
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX12-NEXT: v_max_num_f64_e32 v[8:9], v[6:7], v[6:7]
+; GFX12-NEXT: v_max_num_f64_e32 v[4:5], v[8:9], v[4:5]
; GFX12-NEXT: s_wait_storecnt 0x0
; GFX12-NEXT: global_atomic_cmpswap_b64 v[4:5], v[0:1], v[4:7], off th:TH_ATOMIC_RETURN scope:SCOPE_DEV
; GFX12-NEXT: s_wait_loadcnt 0x0
@@ -646,15 +646,15 @@ define double @global_agent_atomic_fmax_ret_f64__amdgpu_no_fine_grained_memory(p
; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: global_load_b64 v[4:5], v[0:1], off
-; GFX11-NEXT: v_max_f64 v[2:3], v[2:3], v[2:3]
; GFX11-NEXT: s_mov_b32 s0, 0
; GFX11-NEXT: .LBB6_1: ; %atomicrmw.start
; GFX11-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: v_dual_mov_b32 v7, v5 :: v_dual_mov_b32 v6, v4
-; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX11-NEXT: v_max_f64 v[4:5], v[6:7], v[6:7]
-; GFX11-NEXT: v_max_f64 v[4:5], v[4:5], v[2:3]
+; GFX11-NEXT: v_max_f64 v[4:5], v[2:3], v[2:3]
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_max_f64 v[8:9], v[6:7], v[6:7]
+; GFX11-NEXT: v_max_f64 v[4:5], v[8:9], v[4:5]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: global_atomic_cmpswap_b64 v[4:5], v[0:1], v[4:7], off glc
; GFX11-NEXT: s_waitcnt vmcnt(0)
@@ -692,15 +692,15 @@ define double @global_agent_atomic_fmax_ret_f64__amdgpu_no_fine_grained_memory(p
; GFX908: ; %bb.0:
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX908-NEXT: global_load_dwordx2 v[4:5], v[0:1], off
-; GFX908-NEXT: v_max_f64 v[2:3], v[2:3], v[2:3]
; GFX908-NEXT: s_mov_b64 s[4:5], 0
; GFX908-NEXT: .LBB6_1: ; %atomicrmw.start
; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: v_mov_b32_e32 v7, v5
; GFX908-NEXT: v_mov_b32_e32 v6, v4
-; GFX908-NEXT: v_max_f64 v[4:5], v[6:7], v[6:7]
-; GFX908-NEXT: v_max_f64 v[4:5], v[4:5], v[2:3]
+; GFX908-NEXT: v_max_f64 v[4:5], v[2:3], v[2:3]
+; GFX908-NEXT: v_max_f64 v[8:9], v[6:7], v[6:7]
+; GFX908-NEXT: v_max_f64 v[4:5], v[8:9], v[4:5]
; GFX908-NEXT: global_atomic_cmpswap_x2 v[4:5], v[0:1], v[4:7], off glc
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: buffer_wbinvl1
@@ -718,15 +718,15 @@ define double @global_agent_atomic_fmax_ret_f64__amdgpu_no_fine_grained_memory(p
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: flat_load_dwordx2 v[4:5], v[0:1]
-; GFX8-NEXT: v_max_f64 v[2:3], v[2:3], v[2:3]
; GFX8-NEXT: s_mov_b64 s[4:5], 0
; GFX8-NEXT: .LBB6_1: ; %atomicrmw.start
; GFX8-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v7, v5
; GFX8-NEXT: v_mov_b32_e32 v6, v4
-; GFX8-NEXT: v_max_f64 v[4:5], v[6:7], v[6:7]
-; GFX8-NEXT: v_max_f64 v[4:5], v[4:5], v[2:3]
+; GFX8-NEXT: v_max_f64 v[4:5], v[2:3], v[2:3]
+; GFX8-NEXT: v_max_f64 v[8:9], v[6:7], v[6:7]
+; GFX8-NEXT: v_max_f64 v[4:5], v[8:9], v[4:5]
; GFX8-NEXT: flat_atomic_cmpswap_x2 v[4:5], v[0:1], v[4:7] glc
; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: buffer_wbinvl1
@@ -764,21 +764,21 @@ define void @global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory(p
; GFX12-NEXT: s_wait_samplecnt 0x0
; GFX12-NEXT: s_wait_bvhcnt 0x0
; GFX12-NEXT: s_wait_kmcnt 0x0
-; GFX12-NEXT: global_load_b64 v[4:5], v[0:1], off
-; GFX12-NEXT: v_max_num_f64_e32 v[6:7], v[2:3], v[2:3]
+; GFX12-NEXT: global_load_b64 v[6:7], v[0:1], off
; GFX12-NEXT: s_mov_b32 s0, 0
; GFX12-NEXT: .LBB7_1: ; %atomicrmw.start
; GFX12-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX12-NEXT: s_wait_loadcnt 0x0
-; GFX12-NEXT: v_max_num_f64_e32 v[2:3], v[4:5], v[4:5]
+; GFX12-NEXT: v_max_num_f64_e32 v[4:5], v[6:7], v[6:7]
+; GFX12-NEXT: v_max_num_f64_e32 v[8:9], v[2:3], v[2:3]
; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
-; GFX12-NEXT: v_max_num_f64_e32 v[2:3], v[2:3], v[6:7]
+; GFX12-NEXT: v_max_num_f64_e32 v[4:5], v[4:5], v[8:9]
; GFX12-NEXT: s_wait_storecnt 0x0
-; GFX12-NEXT: global_atomic_cmpswap_b64 v[2:3], v[0:1], v[2:5], off th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX12-NEXT: global_atomic_cmpswap_b64 v[4:5], v[0:1], v[4:7], off th:TH_ATOMIC_RETURN scope:SCOPE_DEV
; GFX12-NEXT: s_wait_loadcnt 0x0
; GFX12-NEXT: global_inv scope:SCOPE_DEV
-; GFX12-NEXT: v_cmp_eq_u64_e32 vcc_lo, v[2:3], v[4:5]
-; GFX12-NEXT: v_dual_mov_b32 v5, v3 :: v_dual_mov_b32 v4, v2
+; GFX12-NEXT: v_cmp_eq_u64_e32 vcc_lo, v[4:5], v[6:7]
+; GFX12-NEXT: v_dual_mov_b32 v7, v5 :: v_dual_mov_b32 v6, v4
; GFX12-NEXT: s_wait_alu 0xfffe
; GFX12-NEXT: s_or_b32 s0, vcc_lo, s0
; GFX12-NEXT: s_wait_alu 0xfffe
@@ -801,22 +801,22 @@ define void @global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory(p
; GFX11-LABEL: global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory:
; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT: global_load_b64 v[4:5], v[0:1], off
-; GFX11-NEXT: v_max_f64 v[6:7], v[2:3], v[2:3]
+; GFX11-NEXT: global_load_b64 v[6:7], v[0:1], off
; GFX11-NEXT: s_mov_b32 s0, 0
; GFX11-NEXT: .LBB7_1: ; %atomicrmw.start
; GFX11-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX11-NEXT: s_waitcnt vmcnt(0)
-; GFX11-NEXT: v_max_f64 v[2:3], v[4:5], v[4:5]
+; GFX11-NEXT: v_max_f64 v[4:5], v[6:7], v[6:7]
+; GFX11-NEXT: v_max_f64 v[8:9], v[2:3], v[2:3]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT: v_max_f64 v[2:3], v[2:3], v[6:7]
+; GFX11-NEXT: v_max_f64 v[4:5], v[4:5], v[8:9]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
-; GFX11-NEXT: global_atomic_cmpswap_b64 v[2:3], v[0:1], v[2:5], off glc
+; GFX11-NEXT: global_atomic_cmpswap_b64 v[4:5], v[0:1], v[4:7], off glc
; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: buffer_gl1_inv
; GFX11-NEXT: buffer_gl0_inv
-; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, v[2:3], v[4:5]
-; GFX11-NEXT: v_dual_mov_b32 v5, v3 :: v_dual_mov_b32 v4, v2
+; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, v[4:5], v[6:7]
+; GFX11-NEXT: v_dual_mov_b32 v7, v5 :: v_dual_mov_b32 v6, v4
; GFX11-NEXT: s_or_b32 s0, vcc_lo, s0
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
; GFX11-NEXT: s_and_not1_b32 exec_lo, exec_lo, s0
@@ -846,21 +846,21 @@ define void @global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory(p
; GFX908-LABEL: global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory:
; GFX908: ; %bb.0:
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX908-NEXT: global_load_dwordx2 v[4:5], v[0:1], off
-; GFX908-NEXT: v_max_f64 v[6:7], v[2:3], v[2:3]
+; GFX908-NEXT: global_load_dwordx2 v[6:7], v[0:1], off
; GFX908-NEXT: s_mov_b64 s[4:5], 0
; GFX908-NEXT: .LBB7_1: ; %atomicrmw.start
; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX908-NEXT: s_waitcnt vmcnt(0)
-; GFX908-NEXT: v_max_f64 v[2:3], v[4:5], v[4:5]
-; GFX908-NEXT: v_max_f64 v[2:3], v[2:3], v[6:7]
-; GFX908-NEXT: global_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5], off glc
+; GFX908-NEXT: v_max_f64 v[4:5], v[6:7], v[6:7]
+; GFX908-NEXT: v_max_f64 v[8:9], v[2:3], v[2:3]
+; GFX908-NEXT: v_max_f64 v[4:5], v[4:5], v[8:9]
+; GFX908-NEXT: global_atomic_cmpswap_x2 v[4:5], v[0:1], v[4:7], off glc
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: buffer_wbinvl1
-; GFX908-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]
-; GFX908-NEXT: v_mov_b32_e32 v5, v3
+; GFX908-NEXT: v_cmp_eq_u64_e32 vcc, v[4:5], v[6:7]
+; GFX908-NEXT: v_mov_b32_e32 v7, v5
; GFX908-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
-; GFX908-NEXT: v_mov_b32_e32 v4, v2
+; GFX908-NEXT: v_mov_b32_e32 v6, v4
; GFX908-NEXT: s_andn2_b64 exec, exec, s[4:5]
; GFX908-NEXT: s_cbranch_execnz .LBB7_1
; GFX908-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -870,21 +870,21 @@ define void @global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory(p
; GFX8-LABEL: global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT: flat_load_dwordx2 v[4:5], v[0:1]
-; GFX8-NEXT: v_max_f64 v[6:7], v[2:3], v[2:3]
+; GFX8-NEXT: flat_load_dwordx2 v[6:7], v[0:1]
; GFX8-NEXT: s_mov_b64 s[4:5], 0
; GFX8-NEXT: .LBB7_1: ; %atomicrmw.start
; GFX8-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: v_max_f64 v[2:3], v[4:5], v[4:5]
-; GFX8-NEXT: v_max_f64 v[2:3], v[2:3], v[6:7]
-; GFX8-NEXT: flat_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5] glc
+; GFX8-NEXT: v_max_f64 v[4:5], v[6:7], v[6:7]
+; GFX8-NEXT: v_max_f64 v[8:9], v[2:3], v[2:3]
+; GFX8-NEXT: v_max_f64 v[4:5], v[4:5], v[8:9]
+; GFX8-NEXT: flat_atomic_cmpswap_x2 v[4:5], v[0:1], v[4:7] glc
; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: buffer_wbinvl1
-; GFX8-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]
-; GFX8-NEXT: v_mov_b32_e32 v5, v3
+; GFX8-NEXT: v_cmp_eq_u64_e32 vcc, v[4:5], v[6:7]
+; GFX8-NEXT: v_mov_b32_e32 v7, v5
; GFX8-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
-; GFX8-NEXT: v_mov_b32_e32 v4, v2
+; GFX8-NEXT: v_mov_b32_e32 v6, v4
; GFX8-NEXT: s_andn2_b64 exec, exec, s[4:5]
; GFX8-NEXT: s_cbranch_execnz .LBB7_1
; GFX8-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -925,13 +925,13 @@ define float @flat_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(ptr
; GFX942-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX942-NEXT: flat_load_dword v3, v[0:1]
; GFX942-NEXT: s_mov_b64 s[0:1], 0
-; GFX942-NEXT: v_max_f32_e32 v2, v2, v2
; GFX942-NEXT: .LBB8_1: ; %atomicrmw.start
; GFX942-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX942-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NEXT: v_mov_b32_e32 v5, v3
-; GFX942-NEXT: v_max_f32_e32 v3, v5, v5
-; GFX942-NEXT: v_max_f32_e32 v4, v3, v2
+; GFX942-NEXT: v_max_f32_e32 v3, v2, v2
+; GFX942-NEXT: v_max_f32_e32 v4, v5, v5
+; GFX942-NEXT: v_max_f32_e32 v4, v4, v3
; GFX942-NEXT: buffer_wbl2 sc1
; GFX942-NEXT: flat_atomic_cmpswap v3, v[0:1], v[4:5] sc0
; GFX942-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -970,13 +970,13 @@ define float @flat_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(ptr
; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX90A-NEXT: flat_load_dword v3, v[0:1]
; GFX90A-NEXT: s_mov_b64 s[4:5], 0
-; GFX90A-NEXT: v_max_f32_e32 v2, v2, v2
; GFX90A-NEXT: .LBB8_1: ; %atomicrmw.start
; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: v_mov_b32_e32 v5, v3
-; GFX90A-NEXT: v_max_f32_e32 v3, v5, v5
-; GFX90A-NEXT: v_max_f32_e32 v4, v3, v2
+; GFX90A-NEXT: v_max_f32_e32 v3, v2, v2
+; GFX90A-NEXT: v_max_f32_e32 v4, v5, v5
+; GFX90A-NEXT: v_max_f32_e32 v4, v4, v3
; GFX90A-NEXT: flat_atomic_cmpswap v3, v[0:1], v[4:5] glc
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: buffer_wbinvl1
@@ -994,13 +994,13 @@ define float @flat_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(ptr
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX908-NEXT: flat_load_dword v3, v[0:1]
; GFX908-NEXT: s_mov_b64 s[4:5], 0
-; GFX908-NEXT: v_max_f32_e32 v2, v2, v2
; GFX908-NEXT: .LBB8_1: ; %atomicrmw.start
; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX908-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX908-NEXT: v_mov_b32_e32 v4, v3
-; GFX908-NEXT: v_max_f32_e32 v3, v4, v4
-; GFX908-NEXT: v_max_f32_e32 v...
[truncated]
|
@llvm/pr-subscribers-backend-amdgpu Author: Luo, Yuanke (LuoYuanke) ChangesThere are two API of getRegPressureSetLimit() in backend. One is provided by Patch is 6.91 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/156173.diff 63 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp
index 286fbfd373b59..f1c14ff0045a3 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -396,13 +396,15 @@ bool MachineLICMImpl::run(MachineFunction &MF) {
LLVM_DEBUG(dbgs() << MF.getName() << " ********\n");
if (PreRegAlloc) {
+ RegisterClassInfo RegClassInfo;
+ RegClassInfo.runOnMachineFunction(MF);
// Estimate register pressure during pre-regalloc pass.
unsigned NumRPS = TRI->getNumRegPressureSets();
RegPressure.resize(NumRPS);
llvm::fill(RegPressure, 0);
RegLimit.resize(NumRPS);
for (unsigned i = 0, e = NumRPS; i != e; ++i)
- RegLimit[i] = TRI->getRegPressureSetLimit(MF, i);
+ RegLimit[i] = RegClassInfo.getRegPressureSetLimit(i);
}
if (HoistConstLoads)
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
index 666523c88860c..39c5b4d5a4741 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
@@ -330,13 +330,13 @@ define float @global_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(pt
; GFX942-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX942-NEXT: global_load_dword v3, v[0:1], off
; GFX942-NEXT: s_mov_b64 s[0:1], 0
-; GFX942-NEXT: v_max_f32_e32 v2, v2, v2
; GFX942-NEXT: .LBB4_1: ; %atomicrmw.start
; GFX942-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX942-NEXT: s_waitcnt vmcnt(0)
; GFX942-NEXT: v_mov_b32_e32 v5, v3
-; GFX942-NEXT: v_max_f32_e32 v3, v5, v5
-; GFX942-NEXT: v_max_f32_e32 v4, v3, v2
+; GFX942-NEXT: v_max_f32_e32 v3, v2, v2
+; GFX942-NEXT: v_max_f32_e32 v4, v5, v5
+; GFX942-NEXT: v_max_f32_e32 v4, v4, v3
; GFX942-NEXT: buffer_wbl2 sc1
; GFX942-NEXT: global_atomic_cmpswap v3, v[0:1], v[4:5], off sc0
; GFX942-NEXT: s_waitcnt vmcnt(0)
@@ -375,13 +375,13 @@ define float @global_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(pt
; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX90A-NEXT: global_load_dword v3, v[0:1], off
; GFX90A-NEXT: s_mov_b64 s[4:5], 0
-; GFX90A-NEXT: v_max_f32_e32 v2, v2, v2
; GFX90A-NEXT: .LBB4_1: ; %atomicrmw.start
; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: v_mov_b32_e32 v5, v3
-; GFX90A-NEXT: v_max_f32_e32 v3, v5, v5
-; GFX90A-NEXT: v_max_f32_e32 v4, v3, v2
+; GFX90A-NEXT: v_max_f32_e32 v3, v2, v2
+; GFX90A-NEXT: v_max_f32_e32 v4, v5, v5
+; GFX90A-NEXT: v_max_f32_e32 v4, v4, v3
; GFX90A-NEXT: global_atomic_cmpswap v3, v[0:1], v[4:5], off glc
; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_wbinvl1
@@ -399,13 +399,13 @@ define float @global_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(pt
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX908-NEXT: global_load_dword v3, v[0:1], off
; GFX908-NEXT: s_mov_b64 s[4:5], 0
-; GFX908-NEXT: v_max_f32_e32 v2, v2, v2
; GFX908-NEXT: .LBB4_1: ; %atomicrmw.start
; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: v_mov_b32_e32 v4, v3
-; GFX908-NEXT: v_max_f32_e32 v3, v4, v4
-; GFX908-NEXT: v_max_f32_e32 v3, v3, v2
+; GFX908-NEXT: v_max_f32_e32 v3, v2, v2
+; GFX908-NEXT: v_max_f32_e32 v5, v4, v4
+; GFX908-NEXT: v_max_f32_e32 v3, v5, v3
; GFX908-NEXT: global_atomic_cmpswap v3, v[0:1], v[3:4], off glc
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: buffer_wbinvl1
@@ -475,21 +475,21 @@ define void @global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory(p
; GFX942-LABEL: global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory:
; GFX942: ; %bb.0:
; GFX942-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX942-NEXT: global_load_dword v3, v[0:1], off
+; GFX942-NEXT: global_load_dword v5, v[0:1], off
; GFX942-NEXT: s_mov_b64 s[0:1], 0
-; GFX942-NEXT: v_max_f32_e32 v4, v2, v2
; GFX942-NEXT: .LBB5_1: ; %atomicrmw.start
; GFX942-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX942-NEXT: s_waitcnt vmcnt(0)
-; GFX942-NEXT: v_max_f32_e32 v2, v3, v3
-; GFX942-NEXT: v_max_f32_e32 v2, v2, v4
+; GFX942-NEXT: v_max_f32_e32 v3, v5, v5
+; GFX942-NEXT: v_max_f32_e32 v4, v2, v2
+; GFX942-NEXT: v_max_f32_e32 v4, v3, v4
; GFX942-NEXT: buffer_wbl2 sc1
-; GFX942-NEXT: global_atomic_cmpswap v2, v[0:1], v[2:3], off sc0
+; GFX942-NEXT: global_atomic_cmpswap v3, v[0:1], v[4:5], off sc0
; GFX942-NEXT: s_waitcnt vmcnt(0)
; GFX942-NEXT: buffer_inv sc1
-; GFX942-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3
+; GFX942-NEXT: v_cmp_eq_u32_e32 vcc, v3, v5
; GFX942-NEXT: s_or_b64 s[0:1], vcc, s[0:1]
-; GFX942-NEXT: v_mov_b32_e32 v3, v2
+; GFX942-NEXT: v_mov_b32_e32 v5, v3
; GFX942-NEXT: s_andn2_b64 exec, exec, s[0:1]
; GFX942-NEXT: s_cbranch_execnz .LBB5_1
; GFX942-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -519,20 +519,20 @@ define void @global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory(p
; GFX90A-LABEL: global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory:
; GFX90A: ; %bb.0:
; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX90A-NEXT: global_load_dword v3, v[0:1], off
+; GFX90A-NEXT: global_load_dword v5, v[0:1], off
; GFX90A-NEXT: s_mov_b64 s[4:5], 0
-; GFX90A-NEXT: v_max_f32_e32 v4, v2, v2
; GFX90A-NEXT: .LBB5_1: ; %atomicrmw.start
; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX90A-NEXT: s_waitcnt vmcnt(0)
-; GFX90A-NEXT: v_max_f32_e32 v2, v3, v3
-; GFX90A-NEXT: v_max_f32_e32 v2, v2, v4
-; GFX90A-NEXT: global_atomic_cmpswap v2, v[0:1], v[2:3], off glc
+; GFX90A-NEXT: v_max_f32_e32 v3, v5, v5
+; GFX90A-NEXT: v_max_f32_e32 v4, v2, v2
+; GFX90A-NEXT: v_max_f32_e32 v4, v3, v4
+; GFX90A-NEXT: global_atomic_cmpswap v3, v[0:1], v[4:5], off glc
; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_wbinvl1
-; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3
+; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v3, v5
; GFX90A-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
-; GFX90A-NEXT: v_mov_b32_e32 v3, v2
+; GFX90A-NEXT: v_mov_b32_e32 v5, v3
; GFX90A-NEXT: s_andn2_b64 exec, exec, s[4:5]
; GFX90A-NEXT: s_cbranch_execnz .LBB5_1
; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -542,20 +542,20 @@ define void @global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory(p
; GFX908-LABEL: global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory:
; GFX908: ; %bb.0:
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX908-NEXT: global_load_dword v3, v[0:1], off
+; GFX908-NEXT: global_load_dword v4, v[0:1], off
; GFX908-NEXT: s_mov_b64 s[4:5], 0
-; GFX908-NEXT: v_max_f32_e32 v4, v2, v2
; GFX908-NEXT: .LBB5_1: ; %atomicrmw.start
; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX908-NEXT: s_waitcnt vmcnt(0)
-; GFX908-NEXT: v_max_f32_e32 v2, v3, v3
-; GFX908-NEXT: v_max_f32_e32 v2, v2, v4
-; GFX908-NEXT: global_atomic_cmpswap v2, v[0:1], v[2:3], off glc
+; GFX908-NEXT: v_max_f32_e32 v3, v4, v4
+; GFX908-NEXT: v_max_f32_e32 v5, v2, v2
+; GFX908-NEXT: v_max_f32_e32 v3, v3, v5
+; GFX908-NEXT: global_atomic_cmpswap v3, v[0:1], v[3:4], off glc
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: buffer_wbinvl1
-; GFX908-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3
+; GFX908-NEXT: v_cmp_eq_u32_e32 vcc, v3, v4
; GFX908-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
-; GFX908-NEXT: v_mov_b32_e32 v3, v2
+; GFX908-NEXT: v_mov_b32_e32 v4, v3
; GFX908-NEXT: s_andn2_b64 exec, exec, s[4:5]
; GFX908-NEXT: s_cbranch_execnz .LBB5_1
; GFX908-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -608,15 +608,15 @@ define double @global_agent_atomic_fmax_ret_f64__amdgpu_no_fine_grained_memory(p
; GFX12-NEXT: s_wait_bvhcnt 0x0
; GFX12-NEXT: s_wait_kmcnt 0x0
; GFX12-NEXT: global_load_b64 v[4:5], v[0:1], off
-; GFX12-NEXT: v_max_num_f64_e32 v[2:3], v[2:3], v[2:3]
; GFX12-NEXT: s_mov_b32 s0, 0
; GFX12-NEXT: .LBB6_1: ; %atomicrmw.start
; GFX12-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX12-NEXT: s_wait_loadcnt 0x0
; GFX12-NEXT: v_dual_mov_b32 v7, v5 :: v_dual_mov_b32 v6, v4
-; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX12-NEXT: v_max_num_f64_e32 v[4:5], v[6:7], v[6:7]
-; GFX12-NEXT: v_max_num_f64_e32 v[4:5], v[4:5], v[2:3]
+; GFX12-NEXT: v_max_num_f64_e32 v[4:5], v[2:3], v[2:3]
+; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX12-NEXT: v_max_num_f64_e32 v[8:9], v[6:7], v[6:7]
+; GFX12-NEXT: v_max_num_f64_e32 v[4:5], v[8:9], v[4:5]
; GFX12-NEXT: s_wait_storecnt 0x0
; GFX12-NEXT: global_atomic_cmpswap_b64 v[4:5], v[0:1], v[4:7], off th:TH_ATOMIC_RETURN scope:SCOPE_DEV
; GFX12-NEXT: s_wait_loadcnt 0x0
@@ -646,15 +646,15 @@ define double @global_agent_atomic_fmax_ret_f64__amdgpu_no_fine_grained_memory(p
; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: global_load_b64 v[4:5], v[0:1], off
-; GFX11-NEXT: v_max_f64 v[2:3], v[2:3], v[2:3]
; GFX11-NEXT: s_mov_b32 s0, 0
; GFX11-NEXT: .LBB6_1: ; %atomicrmw.start
; GFX11-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: v_dual_mov_b32 v7, v5 :: v_dual_mov_b32 v6, v4
-; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX11-NEXT: v_max_f64 v[4:5], v[6:7], v[6:7]
-; GFX11-NEXT: v_max_f64 v[4:5], v[4:5], v[2:3]
+; GFX11-NEXT: v_max_f64 v[4:5], v[2:3], v[2:3]
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-NEXT: v_max_f64 v[8:9], v[6:7], v[6:7]
+; GFX11-NEXT: v_max_f64 v[4:5], v[8:9], v[4:5]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: global_atomic_cmpswap_b64 v[4:5], v[0:1], v[4:7], off glc
; GFX11-NEXT: s_waitcnt vmcnt(0)
@@ -692,15 +692,15 @@ define double @global_agent_atomic_fmax_ret_f64__amdgpu_no_fine_grained_memory(p
; GFX908: ; %bb.0:
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX908-NEXT: global_load_dwordx2 v[4:5], v[0:1], off
-; GFX908-NEXT: v_max_f64 v[2:3], v[2:3], v[2:3]
; GFX908-NEXT: s_mov_b64 s[4:5], 0
; GFX908-NEXT: .LBB6_1: ; %atomicrmw.start
; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: v_mov_b32_e32 v7, v5
; GFX908-NEXT: v_mov_b32_e32 v6, v4
-; GFX908-NEXT: v_max_f64 v[4:5], v[6:7], v[6:7]
-; GFX908-NEXT: v_max_f64 v[4:5], v[4:5], v[2:3]
+; GFX908-NEXT: v_max_f64 v[4:5], v[2:3], v[2:3]
+; GFX908-NEXT: v_max_f64 v[8:9], v[6:7], v[6:7]
+; GFX908-NEXT: v_max_f64 v[4:5], v[8:9], v[4:5]
; GFX908-NEXT: global_atomic_cmpswap_x2 v[4:5], v[0:1], v[4:7], off glc
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: buffer_wbinvl1
@@ -718,15 +718,15 @@ define double @global_agent_atomic_fmax_ret_f64__amdgpu_no_fine_grained_memory(p
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: flat_load_dwordx2 v[4:5], v[0:1]
-; GFX8-NEXT: v_max_f64 v[2:3], v[2:3], v[2:3]
; GFX8-NEXT: s_mov_b64 s[4:5], 0
; GFX8-NEXT: .LBB6_1: ; %atomicrmw.start
; GFX8-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v7, v5
; GFX8-NEXT: v_mov_b32_e32 v6, v4
-; GFX8-NEXT: v_max_f64 v[4:5], v[6:7], v[6:7]
-; GFX8-NEXT: v_max_f64 v[4:5], v[4:5], v[2:3]
+; GFX8-NEXT: v_max_f64 v[4:5], v[2:3], v[2:3]
+; GFX8-NEXT: v_max_f64 v[8:9], v[6:7], v[6:7]
+; GFX8-NEXT: v_max_f64 v[4:5], v[8:9], v[4:5]
; GFX8-NEXT: flat_atomic_cmpswap_x2 v[4:5], v[0:1], v[4:7] glc
; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: buffer_wbinvl1
@@ -764,21 +764,21 @@ define void @global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory(p
; GFX12-NEXT: s_wait_samplecnt 0x0
; GFX12-NEXT: s_wait_bvhcnt 0x0
; GFX12-NEXT: s_wait_kmcnt 0x0
-; GFX12-NEXT: global_load_b64 v[4:5], v[0:1], off
-; GFX12-NEXT: v_max_num_f64_e32 v[6:7], v[2:3], v[2:3]
+; GFX12-NEXT: global_load_b64 v[6:7], v[0:1], off
; GFX12-NEXT: s_mov_b32 s0, 0
; GFX12-NEXT: .LBB7_1: ; %atomicrmw.start
; GFX12-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX12-NEXT: s_wait_loadcnt 0x0
-; GFX12-NEXT: v_max_num_f64_e32 v[2:3], v[4:5], v[4:5]
+; GFX12-NEXT: v_max_num_f64_e32 v[4:5], v[6:7], v[6:7]
+; GFX12-NEXT: v_max_num_f64_e32 v[8:9], v[2:3], v[2:3]
; GFX12-NEXT: s_delay_alu instid0(VALU_DEP_1)
-; GFX12-NEXT: v_max_num_f64_e32 v[2:3], v[2:3], v[6:7]
+; GFX12-NEXT: v_max_num_f64_e32 v[4:5], v[4:5], v[8:9]
; GFX12-NEXT: s_wait_storecnt 0x0
-; GFX12-NEXT: global_atomic_cmpswap_b64 v[2:3], v[0:1], v[2:5], off th:TH_ATOMIC_RETURN scope:SCOPE_DEV
+; GFX12-NEXT: global_atomic_cmpswap_b64 v[4:5], v[0:1], v[4:7], off th:TH_ATOMIC_RETURN scope:SCOPE_DEV
; GFX12-NEXT: s_wait_loadcnt 0x0
; GFX12-NEXT: global_inv scope:SCOPE_DEV
-; GFX12-NEXT: v_cmp_eq_u64_e32 vcc_lo, v[2:3], v[4:5]
-; GFX12-NEXT: v_dual_mov_b32 v5, v3 :: v_dual_mov_b32 v4, v2
+; GFX12-NEXT: v_cmp_eq_u64_e32 vcc_lo, v[4:5], v[6:7]
+; GFX12-NEXT: v_dual_mov_b32 v7, v5 :: v_dual_mov_b32 v6, v4
; GFX12-NEXT: s_wait_alu 0xfffe
; GFX12-NEXT: s_or_b32 s0, vcc_lo, s0
; GFX12-NEXT: s_wait_alu 0xfffe
@@ -801,22 +801,22 @@ define void @global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory(p
; GFX11-LABEL: global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory:
; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT: global_load_b64 v[4:5], v[0:1], off
-; GFX11-NEXT: v_max_f64 v[6:7], v[2:3], v[2:3]
+; GFX11-NEXT: global_load_b64 v[6:7], v[0:1], off
; GFX11-NEXT: s_mov_b32 s0, 0
; GFX11-NEXT: .LBB7_1: ; %atomicrmw.start
; GFX11-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX11-NEXT: s_waitcnt vmcnt(0)
-; GFX11-NEXT: v_max_f64 v[2:3], v[4:5], v[4:5]
+; GFX11-NEXT: v_max_f64 v[4:5], v[6:7], v[6:7]
+; GFX11-NEXT: v_max_f64 v[8:9], v[2:3], v[2:3]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT: v_max_f64 v[2:3], v[2:3], v[6:7]
+; GFX11-NEXT: v_max_f64 v[4:5], v[4:5], v[8:9]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
-; GFX11-NEXT: global_atomic_cmpswap_b64 v[2:3], v[0:1], v[2:5], off glc
+; GFX11-NEXT: global_atomic_cmpswap_b64 v[4:5], v[0:1], v[4:7], off glc
; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: buffer_gl1_inv
; GFX11-NEXT: buffer_gl0_inv
-; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, v[2:3], v[4:5]
-; GFX11-NEXT: v_dual_mov_b32 v5, v3 :: v_dual_mov_b32 v4, v2
+; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, v[4:5], v[6:7]
+; GFX11-NEXT: v_dual_mov_b32 v7, v5 :: v_dual_mov_b32 v6, v4
; GFX11-NEXT: s_or_b32 s0, vcc_lo, s0
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
; GFX11-NEXT: s_and_not1_b32 exec_lo, exec_lo, s0
@@ -846,21 +846,21 @@ define void @global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory(p
; GFX908-LABEL: global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory:
; GFX908: ; %bb.0:
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX908-NEXT: global_load_dwordx2 v[4:5], v[0:1], off
-; GFX908-NEXT: v_max_f64 v[6:7], v[2:3], v[2:3]
+; GFX908-NEXT: global_load_dwordx2 v[6:7], v[0:1], off
; GFX908-NEXT: s_mov_b64 s[4:5], 0
; GFX908-NEXT: .LBB7_1: ; %atomicrmw.start
; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX908-NEXT: s_waitcnt vmcnt(0)
-; GFX908-NEXT: v_max_f64 v[2:3], v[4:5], v[4:5]
-; GFX908-NEXT: v_max_f64 v[2:3], v[2:3], v[6:7]
-; GFX908-NEXT: global_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5], off glc
+; GFX908-NEXT: v_max_f64 v[4:5], v[6:7], v[6:7]
+; GFX908-NEXT: v_max_f64 v[8:9], v[2:3], v[2:3]
+; GFX908-NEXT: v_max_f64 v[4:5], v[4:5], v[8:9]
+; GFX908-NEXT: global_atomic_cmpswap_x2 v[4:5], v[0:1], v[4:7], off glc
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: buffer_wbinvl1
-; GFX908-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]
-; GFX908-NEXT: v_mov_b32_e32 v5, v3
+; GFX908-NEXT: v_cmp_eq_u64_e32 vcc, v[4:5], v[6:7]
+; GFX908-NEXT: v_mov_b32_e32 v7, v5
; GFX908-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
-; GFX908-NEXT: v_mov_b32_e32 v4, v2
+; GFX908-NEXT: v_mov_b32_e32 v6, v4
; GFX908-NEXT: s_andn2_b64 exec, exec, s[4:5]
; GFX908-NEXT: s_cbranch_execnz .LBB7_1
; GFX908-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -870,21 +870,21 @@ define void @global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory(p
; GFX8-LABEL: global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT: flat_load_dwordx2 v[4:5], v[0:1]
-; GFX8-NEXT: v_max_f64 v[6:7], v[2:3], v[2:3]
+; GFX8-NEXT: flat_load_dwordx2 v[6:7], v[0:1]
; GFX8-NEXT: s_mov_b64 s[4:5], 0
; GFX8-NEXT: .LBB7_1: ; %atomicrmw.start
; GFX8-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: v_max_f64 v[2:3], v[4:5], v[4:5]
-; GFX8-NEXT: v_max_f64 v[2:3], v[2:3], v[6:7]
-; GFX8-NEXT: flat_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5] glc
+; GFX8-NEXT: v_max_f64 v[4:5], v[6:7], v[6:7]
+; GFX8-NEXT: v_max_f64 v[8:9], v[2:3], v[2:3]
+; GFX8-NEXT: v_max_f64 v[4:5], v[4:5], v[8:9]
+; GFX8-NEXT: flat_atomic_cmpswap_x2 v[4:5], v[0:1], v[4:7] glc
; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: buffer_wbinvl1
-; GFX8-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]
-; GFX8-NEXT: v_mov_b32_e32 v5, v3
+; GFX8-NEXT: v_cmp_eq_u64_e32 vcc, v[4:5], v[6:7]
+; GFX8-NEXT: v_mov_b32_e32 v7, v5
; GFX8-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
-; GFX8-NEXT: v_mov_b32_e32 v4, v2
+; GFX8-NEXT: v_mov_b32_e32 v6, v4
; GFX8-NEXT: s_andn2_b64 exec, exec, s[4:5]
; GFX8-NEXT: s_cbranch_execnz .LBB7_1
; GFX8-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -925,13 +925,13 @@ define float @flat_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(ptr
; GFX942-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX942-NEXT: flat_load_dword v3, v[0:1]
; GFX942-NEXT: s_mov_b64 s[0:1], 0
-; GFX942-NEXT: v_max_f32_e32 v2, v2, v2
; GFX942-NEXT: .LBB8_1: ; %atomicrmw.start
; GFX942-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX942-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX942-NEXT: v_mov_b32_e32 v5, v3
-; GFX942-NEXT: v_max_f32_e32 v3, v5, v5
-; GFX942-NEXT: v_max_f32_e32 v4, v3, v2
+; GFX942-NEXT: v_max_f32_e32 v3, v2, v2
+; GFX942-NEXT: v_max_f32_e32 v4, v5, v5
+; GFX942-NEXT: v_max_f32_e32 v4, v4, v3
; GFX942-NEXT: buffer_wbl2 sc1
; GFX942-NEXT: flat_atomic_cmpswap v3, v[0:1], v[4:5] sc0
; GFX942-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -970,13 +970,13 @@ define float @flat_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(ptr
; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX90A-NEXT: flat_load_dword v3, v[0:1]
; GFX90A-NEXT: s_mov_b64 s[4:5], 0
-; GFX90A-NEXT: v_max_f32_e32 v2, v2, v2
; GFX90A-NEXT: .LBB8_1: ; %atomicrmw.start
; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: v_mov_b32_e32 v5, v3
-; GFX90A-NEXT: v_max_f32_e32 v3, v5, v5
-; GFX90A-NEXT: v_max_f32_e32 v4, v3, v2
+; GFX90A-NEXT: v_max_f32_e32 v3, v2, v2
+; GFX90A-NEXT: v_max_f32_e32 v4, v5, v5
+; GFX90A-NEXT: v_max_f32_e32 v4, v4, v3
; GFX90A-NEXT: flat_atomic_cmpswap v3, v[0:1], v[4:5] glc
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: buffer_wbinvl1
@@ -994,13 +994,13 @@ define float @flat_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(ptr
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX908-NEXT: flat_load_dword v3, v[0:1]
; GFX908-NEXT: s_mov_b64 s[4:5], 0
-; GFX908-NEXT: v_max_f32_e32 v2, v2, v2
; GFX908-NEXT: .LBB8_1: ; %atomicrmw.start
; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX908-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX908-NEXT: v_mov_b32_e32 v4, v3
-; GFX908-NEXT: v_max_f32_e32 v3, v4, v4
-; GFX908-NEXT: v_max_f32_e32 v...
[truncated]
|
845a9d2
to
00d441b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really need to get #120690 in to stop recomputing this so many times
; CHECK-NEXT: .LBB0_1: ; %bb25 | ||
; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1 | ||
; CHECK-NEXT: s_and_b64 vcc, exec, s[0:1] | ||
; CHECK-NEXT: s_andn2_b64 vcc, exec, s[0:1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good improvement we probably should have managed without licm
; GFX7LESS-NEXT: v_mov_b32_e32 v5, v3 | ||
; GFX7LESS-NEXT: v_mov_b32_e32 v4, v2 | ||
; GFX7LESS-NEXT: buffer_atomic_cmpswap v[4:5], off, s[4:7], 0 glc | ||
; GFX7LESS-NEXT: v_mul_f32_e64 v2, 1.0, s9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these tests changes look like it's now not hoisting out of a loop in cases that shouldn't be pressure constrained
Duplicated patch of b4e17d4 |
There are two API of getRegPressureSetLimit() in backend. One is provided by
TargetRegisterInfo which return the RegPressureSetLimit that is determined by
specific target without considering the reserved registers. The other is provided
by RegisterClassInfo which is based on TargetRegisterInfo::getRegPressureSetLimit
and is adjusted dynamically for reserved registers.
Most backend pass (e.g., scheduler) use TargetRegisterInfo::getRegPressureSetLimit.
However MachineLICM still use TargetRegisterInfo::getRegPressureSetLimit which is
not accurate.
This patch replaces the TargetRegisterInfo::getRegPressureSetLimit with
TargetRegisterInfo::getRegPressureSetLimit in MachineLICM pass.